Accurate assessment of crop yield is necessary for the efficient use of resources across agroclimatic zones. This study presents an optimized approach to yield prediction that applies the XGBoost machine learning algorithm to Indian agricultural datasets from Kaggle. The approach consists of data pre-processing, feature engineering, Bayesian hyperparameter optimization, and explanation analysis using SHAP. Several machine learning methods, including linear regression, decision trees, random forests, and XGBoost, are assessed with the performance measures RMSE, MAE, and R². The experimental results show that XGBoost outperforms the other algorithms, with low errors and an R² value of 0.9666. In addition, SHAP analysis quantifies the relative contribution of the agricultural factors that influence crop production, which improves the model's interpretability. The optimized approach is expected to significantly increase the accuracy of intelligent agricultural forecasts.
Introduction
Agriculture plays a vital role in India’s economy by supporting food security, rural development, and economic growth. Crop yield forecasting is essential because factors such as rainfall, irrigation, soil quality, temperature, season, and climate change strongly influence agricultural production. Traditional statistical forecasting methods have limitations in identifying complex relationships among agricultural, environmental, and climatic factors, creating a need for advanced machine learning approaches.
The study focuses on using an optimized XGBoost machine learning algorithm for crop yield prediction under Indian agroclimatic conditions. Machine learning, remote sensing, deep learning, and Explainable AI (XAI) techniques have improved agricultural forecasting by analyzing large amounts of soil, climate, satellite, and crop data. However, challenges such as overfitting, low interpretability, and poor generalization remain.
The proposed methodology uses a dataset of 19,689 agricultural samples containing crop type, year, season, state, cultivation area, production, fertilizer, pesticide usage, and yield information. The data is preprocessed through normalization, encoding, and duplicate removal. Feature engineering is applied to improve prediction accuracy. Several machine learning models, including Linear Regression, Decision Tree, Random Forest, and optimized XGBoost, are compared. Bayesian optimization is used to tune XGBoost hyperparameters, while SHAP analysis explains the influence of important features.
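The three preprocessing steps named above (duplicate removal, encoding, and normalization) can be illustrated with a minimal, dependency-free sketch. The records, the column names, and the choice of one-hot encoding and min-max scaling are assumptions made for illustration; the study does not state which encoder or scaler it used, and a real pipeline would more likely rely on a data-frame library.

```python
# Toy records loosely mirroring a few of the dataset's columns; all values are invented.
rows = [
    {"crop": "Rice", "season": "Kharif", "area": 100.0, "fertilizer": 50.0},
    {"crop": "Wheat", "season": "Rabi", "area": 300.0, "fertilizer": 90.0},
    {"crop": "Rice", "season": "Kharif", "area": 100.0, "fertilizer": 50.0},  # exact duplicate
    {"crop": "Maize", "season": "Kharif", "area": 200.0, "fertilizer": 70.0},
]


def drop_duplicates(records):
    """Keep the first occurrence of each record, preserving order."""
    seen, unique = set(), []
    for rec in records:
        key = tuple(sorted(rec.items()))
        if key not in seen:
            seen.add(key)
            unique.append(rec)
    return unique


def one_hot(records, column):
    """Replace a categorical column with one 0/1 indicator column per category."""
    categories = sorted({rec[column] for rec in records})
    encoded = []
    for rec in records:
        new = {k: v for k, v in rec.items() if k != column}
        for cat in categories:
            new[f"{column}={cat}"] = 1.0 if rec[column] == cat else 0.0
        encoded.append(new)
    return encoded


def min_max_normalize(records, column):
    """Rescale a numeric column to [0, 1]; a constant column maps to 0."""
    values = [rec[column] for rec in records]
    lo, hi = min(values), max(values)
    span = hi - lo
    return [
        {**rec, column: (rec[column] - lo) / span if span else 0.0}
        for rec in records
    ]


clean = drop_duplicates(rows)
for col in ("crop", "season"):
    clean = one_hot(clean, col)
for col in ("area", "fertilizer"):
    clean = min_max_normalize(clean, col)

for rec in clean:
    print(rec)
```

In this sketch the scaling bounds are computed on all rows for brevity; in a train/test setting they would normally be fitted on the training split only.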
The experimental results show that the Optimized XGBoost model performs best, achieving:
RMSE: 163.6337
MAE: 19.3717
R² Score: 0.9666
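For reference, the three metrics listed above follow their standard definitions, sketched here in plain Python. The function name `regression_metrics` and the sample values are hypothetical and are not taken from the study.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return RMSE, MAE, and R² for paired true and predicted values."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / n           # mean absolute error
    ss_res = sum(e * e for e in errors)             # residual sum of squares
    rmse = math.sqrt(ss_res / n)                    # root mean squared error
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares
    r2 = 1.0 - ss_res / ss_tot                      # coefficient of determination
    return {"rmse": rmse, "mae": mae, "r2": r2}


# Invented values for illustration only.
y_true = [2.0, 4.0, 6.0, 8.0]
y_pred = [2.0, 5.0, 5.0, 8.0]
metrics = regression_metrics(y_true, y_pred)
print(metrics)
```

Because RMSE squares each error before averaging, it is never smaller than MAE and is dominated by the largest errors. An RMSE far above the MAE, as in the figures reported here, may therefore indicate that a small number of samples carry most of the error.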
The optimized XGBoost model outperforms the Linear Regression, Decision Tree, and Random Forest models on these metrics. SHAP analysis identifies the major factors affecting crop yield, including production, crop type, land area, and state.
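SHAP attributions are based on Shapley values, which split a prediction among the input features. The sketch below computes exact Shapley values by brute force for a small hypothetical model. The function `toy_yield`, its coefficients, the feature values, and the all-zero baseline are assumptions for illustration only; this is not the study's model, and SHAP libraries use much faster model-specific algorithms for tree ensembles.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley attribution of predict(x) - predict(baseline) to each feature.

    Features outside a coalition are set to their baseline value.
    The cost grows exponentially with the number of features.
    """
    n = len(x)
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight for coalitions of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                without_i = [x[j] if j in subset else baseline[j] for j in range(n)]
                with_i = list(without_i)
                with_i[i] = x[i]
                phi[i] += weight * (predict(with_i) - predict(without_i))
    return phi


def toy_yield(features):
    """Hypothetical model with one interaction term (area x pesticide)."""
    area, fertilizer, pesticide = features
    return 2.0 * area + 0.5 * fertilizer + 0.1 * area * pesticide


x = [10.0, 40.0, 5.0]          # invented sample: area, fertilizer, pesticide
baseline = [0.0, 0.0, 0.0]     # assumed reference point
phi = shapley_values(toy_yield, x, baseline)
print(phi)
```

The attributions sum to the difference between the prediction for `x` and the prediction for the baseline, and the interaction term is shared equally between the two features involved. This additive property is what allows per-feature contributions to be ranked.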
Conclusion
This study proposes an optimized XGBoost model for predicting crop yield under Indian agroclimatic conditions. The experiments show that the optimized XGBoost model predicts crop yield well, achieving the highest R² score (0.9666) with low error. Bayesian hyperparameter optimization improved the model's performance and reduced the risk of overfitting. SHAP values enhanced the model's interpretability by showing how individual agricultural features affect crop productivity.